<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic
  PUBLIC "-//OASIS//DTD DITA Composite//EN" "ditabase.dtd">
<topic id="topic1">
  <title id="kt20941">gpexpand</title>
  <body>
    <p>Expands an existing Greenplum Database across new hosts in the system.</p>
    <section id="section2">
      <title>Synopsis</title>
      <codeblock>gpexpand [{-f|--hosts-file} &lt;hosts_file>]
        | {-i|--input} &lt;input_file> [-B &lt;batch_size>]
        | [{-d | --duration} &lt;hh:mm:ss> | {-e|--end} '&lt;YYYY-MM-DD hh:mm:ss>'] 
          [-a|-analyze] 
          [-n  &lt;parallel_processes>]
        | {-r|--rollback}
        | {-c|--clean}
        [-v|--verbose] [-s|--silent]
        [{-t|--tardir} &lt;directory> ]
        [-S|--simple-progress ]

gpexpand -? | -h | --help 

gpexpand --version</codeblock>
    </section>
    <section id="section3">
      <title>Prerequisites</title>
      <ul>
        <li id="kt137989">You are logged in as the Greenplum Database superuser
            (<codeph>gpadmin</codeph>).</li>
        <li id="kt136978">The new segment hosts have been installed and configured as per the
          existing segment hosts. This involves:<ul id="ul_att_m1q_n4">
            <li id="kt138636">Configuring the hardware and OS</li>
            <li id="kt138640">Installing the Greenplum software</li>
            <li id="kt138641">Creating the <codeph>gpadmin</codeph> user account</li>
            <li id="kt138642">Exchanging SSH keys. </li>
          </ul></li>
        <li id="kt137052">Enough disk space on your segment hosts to temporarily hold a copy of your
          largest table. </li>
        <li>When redistributing data, Greenplum Database must be running in production mode.
          Greenplum Database cannot be running in restricted mode or in master mode. The
            <codeph>gpstart</codeph> options <codeph>-R</codeph> or <codeph>-m</codeph> cannot be
          specified to start Greenplum Database. </li>
      </ul>
      <note>These utilities cannot be run while <codeph>gpexpand</codeph> is performing segment
        initialization. <ul id="ul_c3w_wdp_lgb">
          <li><codeph>gpbackup</codeph></li>
          <li><codeph>gpcheckcat</codeph></li>
          <li><codeph>gpconfig</codeph></li>
          <li><codeph>gppkg</codeph></li>
          <li><codeph>gprestore</codeph></li>
        </ul></note>
      <note type="important">When expanding a Greenplum Database system, you must disable Greenplum
        interconnect proxies before adding new hosts and segment instances to the system, and you
        must update the <codeph>gp_interconnect_proxy_addresses</codeph> parameter with the
        newly-added segment instances before you re-enable interconnect proxies. For information
        about Greenplum interconnect proxies, see <xref
          href="../../admin_guide/managing/proxy-ic.html" format="html" scope="external">Configuring Proxies for the Greenplum Interconnect</xref>.</note>
      <p>For information about preparing a system for expansion, see <xref
          href="../../admin_guide/expand/expand-main.html" format="html" scope="external">Expanding a Greenplum
          System</xref><ph otherprops="op-print"> in the <cite>Greenplum Database Administrator
            Guide</cite></ph>.</p>
    </section>
    <section id="section4">
      <title>Description</title>
      <p>The <codeph>gpexpand</codeph> utility performs system expansion in two phases: segment
        instance initialization and then table data redistribution. </p>
      <p>In the initialization phase, <codeph>gpexpand</codeph> runs with an input file that
        specifies data directories, <varname>dbid</varname> values, and other characteristics of the
        new segment instances. You can create the input file manually, or by following the prompts
        in an interactive interview.</p>
      <p>If you choose to create the input file using the interactive interview, you can optionally
        specify a file containing a list of expansion system hosts. If your platform or command
        shell limits the length of the list of hostnames that you can type when prompted in the
        interview, specifying the hosts with <codeph>-f</codeph> may be mandatory. </p>
      <p>In addition to initializing the segment instances, the initialization phase performs these
        actions:</p>
      <ul>
        <li id="kt138259">Creates an expansion schema named <i>gpexpand</i> in the postgres database
          to store the status of the expansion operation, including detailed status for tables.</li>
      </ul>
      <p>In the table data redistribution phase, <codeph>gpexpand</codeph> redistributes table data
        to rebalance the data across the old and new segment instances.</p>
      <note>Data redistribution should be performed during low-use hours. Redistribution can be
        divided into batches over an extended period.</note>
      <p>To begin the redistribution phase, run <codeph>gpexpand</codeph> with either the
          <codeph>-d</codeph> (duration) or <codeph>-e</codeph> (end time) options, or with no
        options. If you specify an end time or duration, then the utility redistributes tables in
        the expansion schema until the specified end time or duration is reached. If you specify no
        options, then the utility redistribution phase continues until all tables in the expansion
        schema are reorganized. Each table is reorganized using <codeph>ALTER TABLE</codeph>
        commands to rebalance the tables across new segments, and to set tables to their original
        distribution policy. If <codeph>gpexpand</codeph> completes the reorganization of all
        tables, it displays a success message and ends.</p>
      <note>This utility uses secure shell (SSH) connections between systems to perform its tasks.
        In large Greenplum Database deployments, cloud deployments, or deployments with a large
        number of segments per host, this utility may exceed the host's maximum threshold for
        unauthenticated connections. Consider updating the SSH <codeph>MaxStartups</codeph> and
          <codeph>MaxSessions</codeph> configuration parameters to increase this threshold. For more
        information about SSH configuration options, refer to the SSH documentation for your Linux
        distribution.</note>
    </section>
    <section id="section5">
      <title>Options</title>
      <parml>
        <plentry>
          <pt>-a | --analyze</pt>
          <pd>Run <codeph>ANALYZE</codeph> to update the table statistics after expansion. The
            default is to not run <codeph>ANALYZE</codeph>.</pd>
        </plentry>
        <plentry>
          <pt>-B <varname>batch_size</varname></pt>
          <pd>Batch size of remote commands to send to a given host before making a one-second
            pause. Default is <codeph>16</codeph>. Valid values are 1-128.</pd>
          <pd>The <codeph>gpexpand</codeph> utility issues a number of setup commands that may
            exceed the host's maximum threshold for unauthenticated connections as defined by
              <codeph>MaxStartups</codeph> in the SSH daemon configuration. The one-second pause
            allows authentications to be completed before <codeph>gpexpand</codeph> issues any more
            commands. </pd>
          <pd>The default value does not normally need to be changed. However, it may be necessary
            to reduce the maximum number of commands if <codeph>gpexpand</codeph> fails with
            connection errors such as <codeph>'ssh_exchange_identification: Connection closed by
              remote host.'</codeph></pd>
        </plentry>
        <plentry>
          <pt>-c | --clean</pt>
          <pd>Remove the expansion schema.</pd>
        </plentry>
        <plentry>
          <pt>-d | --duration <varname>hh:mm:ss</varname></pt>
          <pd>Duration of the expansion session from beginning to end.</pd>
        </plentry>
        <plentry>
          <pt>-e | --end '<varname>YYYY-MM-DD hh:mm:ss</varname>'</pt>
          <pd>Ending date and time for the expansion session.</pd>
        </plentry>
        <plentry>
          <pt>-f | --hosts-file <varname>filename</varname></pt>
          <pd>Specifies the name of a file that contains a list of new hosts for system expansion.
            Each line of the file must contain a single host name. </pd>
          <pd>This file can contain hostnames with or without network interfaces specified. The
              <codeph>gpexpand</codeph> utility handles either case, adding interface numbers to end
            of the hostname if the original nodes are configured with multiple network interfaces.
            <note>The Greenplum Database segment host naming convention is <codeph>sdwN</codeph>
              where <codeph>sdw</codeph> is a prefix and <codeph>N</codeph> is an integer. For
              example, <codeph>sdw1</codeph>, <codeph>sdw2</codeph> and so on. For hosts with
              multiple interfaces, the convention is to append a dash (<codeph>-</codeph>) and
              number to the host name. For example, <codeph>sdw1-1</codeph> and
                <codeph>sdw1-2</codeph> are the two interface names for host <codeph>sdw1</codeph>.
            </note></pd>
          <pd>For information about using a hostname or IP address, see <xref href="#topic1/host_ip"
              format="dita"/>. Also, see <xref href="#topic1/multi_nic" format="dita"/>.</pd>
        </plentry>
        <plentry>
          <pt>-i | --input <varname>input_file</varname></pt>
          <pd>Specifies the name of the expansion configuration file, which contains one line for
            each segment to be added in the format of:</pd>
          <pd>
            <varname>hostname|address|port|datadir|dbid|content|preferred_role</varname>
          </pd>
        </plentry>
        <plentry>
          <pt>-n <varname>parallel_processes</varname></pt>
          <pd>The number of tables to redistribute simultaneously. Valid values are 1 - 96.</pd>
          <pd>Each table redistribution process requires two database connections: one to alter the
            table, and another to update the table's status in the expansion schema. Before
            increasing <codeph>-n</codeph>, check the current value of the server configuration
            parameter <codeph>max_connections</codeph> and make sure the maximum connection limit is
            not exceeded.</pd>
        </plentry>
        <plentry>
          <pt>-r | --rollback</pt>
          <pd>Roll back a failed expansion setup operation. </pd>
        </plentry>
        <plentry>
          <pt>-s | --silent</pt>
          <pd>Runs in silent mode. Does not prompt for confirmation to proceed on warnings.</pd>
        </plentry>
        <plentry>
          <pt>-S | --simple-progress</pt>
          <pd>If specified, the <codeph>gpexpand</codeph> utility records only the minimum progress
            information in the Greenplum Database table <i>gpexpand.expansion_progress</i>. The
            utility does not record the relation size information and status information in the
            table <i>gpexpand.status_detail</i>. </pd>
          <pd>Specifying this option can improve performance by reducing the amount of progress
            information written to the <i>gpexpand</i> tables.</pd>
        </plentry>
        <plentry>
          <pt>[-t | --tardir] <varname>directory</varname></pt>
          <pd>The fully qualified path to a <varname>directory</varname> on segment hosts where the
              <codeph>gpexpand</codeph> utility copies a temporary tar file. The file contains
            Greenplum Database files that are used to create segment instances. The default
            directory is the user home directory.</pd>
        </plentry>
        <plentry>
          <pt>-v | --verbose</pt>
          <pd>Verbose debugging output. With this option, the utility will output all DDL and DML
            used to expand the database.</pd>
        </plentry>
        <plentry>
          <pt>--version</pt>
          <pd>Display the utility's version number and exit.</pd>
        </plentry>
        <plentry>
          <pt>-? | -h | --help</pt>
          <pd>Displays the online help.</pd>
        </plentry>
      </parml>
    </section>
    <section id="host_ip">
      <title>Specifying Hosts using Hostnames or IP Addresses</title>
      <p>When expanding a Greenplum Database system, you can specify either a hostname or an IP
        address for the value. <ul id="ul_zsd_cmh_dmb">
          <li>If you specify a hostname, the resolution of the hostname to an IP address should be
            done locally for security. For example, you should use entries in a local
              <codeph>/etc/hosts</codeph> file to map a hostname to an IP address. The resolution of
            a hostname to an IP address should not be performed by an external service such as a
            public DNS server. You must stop the Greenplum system before you change the mapping of a
            hostname to a different IP address.</li>
          <li>If you specify an IP address, the address should not be changed after the initial
            configuration. When segment mirroring is enabled, replication from the primary to the
            mirror segment will fail if the IP address changes from the configured value. For this
            reason, you should use a hostname when expanding a Greenplum Database system unless you
            have a specific requirement to use IP addresses.</li>
        </ul></p>
      <p>When expanding a Greenplum system, <codeph>gpexpand</codeph> populates <xref
          href="../../ref_guide/system_catalogs/gp_segment_configuration.xml"
          >gp_segment_configuration</xref> catalog table with the new segment instance information.
        Greenplum Database uses the <codeph>address</codeph> value of the
          <codeph>gp_segment_configuration</codeph> catalog table when looking up host systems for
        Greenplum interconnect (internal) communication between the master and segment instances and
        between segment instances, and for other internal communication.</p>
    </section>
    <section id="multi_nic">
      <title>Using Host Systems with Multiple NICs</title>
      <p>If host systems are configured with multiple NICs, you can expand a Greenplum Database
        system to use each NIC as a Greenplum host system. You must ensure that the host systems are
        configured with sufficient resources to support all the segment instances being added to the
        host. Also, if you enable segment mirroring, you must ensure that the expanded Greenplum
        system configuration supports failover if a host system fails. For information about
        Greenplum Database mirroring schemes, see <xref
          href="../../best_practices/ha.html#topic_ngz_qf4_tt" format="html" scope="external"/>.</p>
      <p>For example, this is a <codeph>gpexpand</codeph> configuration file for a simple Greenplum
        system. The segment host <codeph>gp6s1</codeph> and <codeph>gp6s2</codeph> are configured
        with two NICs, <codeph>-s1</codeph> and <codeph>-s2</codeph>, where the Greenplum Database
        system uses each NIC as a host system. </p>
      <codeblock>gp6s1-s2|gp6s1-s2|40001|/data/data1/gpseg2|6|2|p
gp6s2-s1|gp6s2-s1|50000|/data/mirror1/gpseg2|9|2|m
gp6s2-s1|gp6s2-s1|40000|/data/data1/gpseg3|7|3|p
gp6s1-s2|gp6s1-s2|50001|/data/mirror1/gpseg3|8|3|m</codeblock>
    </section>
    <section id="section6">
      <title>Examples</title>
      <p>Run <codeph>gpexpand</codeph> with an input file to initialize new segments and create the
        expansion schema in the postgres database:</p>
      <codeblock>$ gpexpand -i input_file</codeblock>
      <p>Run <codeph>gpexpand</codeph> for sixty hours maximum duration to redistribute tables to
        new segments:</p>
      <codeblock>$ gpexpand -d 60:00:00</codeblock>
    </section>
    <section id="section7">
      <title>See Also</title>
      <p>
        <xref href="gpssh-exkeys.xml#topic1"/>, <xref
          href="../../admin_guide/expand/expand-main.html" format="html" scope="external">Expanding
          a Greenplum System</xref>
        <ph otherprops="op-print">in the <cite>Greenplum Database Administrator
        Guide</cite></ph></p>
    </section>
  </body>
</topic>
